Noisy Subsequence Recognition Using Constrained String Editing Involving Arbitrary Operations*
Abstract
We consider a problem whose solution can benefit cursive script recognition and the recognition of printed character sequences: recognizing words/strings by processing their noisy subsequences. Let X* be any unknown word from a finite dictionary H, and let U be any arbitrary subsequence of X*. We study the problem of estimating X* by processing Y, a noisy version of U. Y contains substitution, insertion, deletion and generalized transposition errors, the latter occurring when transposed characters are themselves subsequently substituted, as is typical in cursive and typewritten script, in molecular biology and in noisy chain-coded boundaries. We do this by defining the constrained edit distance between X ∈ H and Y subject to any arbitrary edit constraint involving the number and type of edit operations to be performed. In this paper we present the first reported solution to the analytic problem of achieving constrained editing of one string to another using these four edit operations, together with an algorithm to compute this constrained edit distance. Using this algorithm we present a syntactic Pattern Recognition (PR) scheme which corrects noisy text containing all these types of errors. Experimental results involving strings of lengths between 40 and 80, with an average of 30.24 deleted characters and an overall average noise of 68.69%, demonstrate the superiority of our system over existing methods.
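To make the four edit operations concrete, the following is a minimal sketch of an unconstrained edit distance that allows substitutions, insertions, deletions and generalized transpositions. It is not the constrained algorithm of the paper: the constraint on the number and type of operations is omitted, and the function name `edit_distance_gt`, the cost parameters, and the rule "swap cost plus a charge per character that still differs after the swap" are all assumptions chosen for illustration.

```python
def edit_distance_gt(x, y, sub=1.0, ins=1.0, dele=1.0, swap=1.0, swap_sub=0.5):
    """Unconstrained edit distance with generalized transpositions.

    All cost values are illustrative assumptions, not taken from the paper.
    """
    n, m = len(x), len(y)
    # D[i][j] = cheapest way to edit x[:i] into y[:j]
    D = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + dele      # delete every character of x[:i]
    for j in range(1, m + 1):
        D[0][j] = D[0][j - 1] + ins       # insert every character of y[:j]

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best = min(
                D[i - 1][j - 1] + (0.0 if x[i - 1] == y[j - 1] else sub),
                D[i - 1][j] + dele,
                D[i][j - 1] + ins,
            )
            if i >= 2 and j >= 2:
                # Generalized transposition: x[i-2]x[i-1] -> y[j-2]y[j-1].
                # Swap the pair, then substitute any character that still differs.
                mismatches = (x[i - 1] != y[j - 2]) + (x[i - 2] != y[j - 1])
                best = min(best, D[i - 2][j - 2] + swap + swap_sub * mismatches)
            D[i][j] = best
    return D[n][m]
```

Under these assumed costs, a dictionary word could be estimated by taking the X in H that minimizes `edit_distance_gt(X, Y)`; the scheme described in the abstract instead minimizes a constrained distance.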
Similar resources
Noisy Subsequence Recognition Using Constrained String Editing Involving Substitutions, Insertions, Deletions and Generalized Transpositions
We consider a problem which can greatly enhance the areas of cursive script recognition and the recognition of printed character sequences. This problem involves recognizing words/strings by processing their noisy subsequences. Let X* be any unknown word from a finite dictionary H. Let U be any arbitrary subsequence of X*. We study the problem of estimating X* by processing Y, a noisy version o...
The Normalized String Editing Problem Revisited
Marzal and Vidal [8] recently considered the problem of computing the normalized edit distance between two strings, and reported experimental results which demonstrated the use of the measure to recognize handwritten characters. Their paper formulated the theoretical properties of the measure and developed two algorithms to compute it. In this short communication we shall demonstrate how this m...
Pattern Recognition of Strings with Substitutions, Insertions, Deletions and Generalized Transpositions
We study the problem of recognizing a string Y which is the noisy version of some unknown string X* chosen from a finite dictionary, H. The traditional case which has been extensively studied in the literature is the one in which Y contains substitution, insertion and deletion (SID) errors. Although some work has been done to extend the traditional set of edit operations to include the straight...
Pattern Recognition of Strings Containing Traditional and Generalized Transposition Errors
We study the problem of recognizing a string Y which is the noisy version of some unknown string X* chosen from a finite dictionary, H. The traditional case which has been extensively studied in the literature is the one in which Y contains substitution, insertion and deletion (SID) errors. Although some work has been done to extend the traditional set of edit operations to include the straight...
Symbolic Channel Modelling for Noisy Channels Which Permit Arbitrary Noise Distributions
In this paper we present a new model for noisy channels which permit arbitrarily distributed substitution, deletion and insertion errors. Apart from its straightforward applications in string generation and recognition, the model also has potential applications in speech and unidimensional signal processing. The model is specified in terms of a noisy string generation technique. Let A be any fi...